ABSTRACT


Laboratory cultures of heterotrophic protists are often xenic, meaning that the culture
contains more than one microbial organism. In this study, we analyzed genome-assembly data
from cultures of four marine protist flagellates (the marine malawimonad Imasa heleensis, the
undescribed mantamonad strain SRT-306, the discobid Ophirina amphinema, and the cryptist
Palpitomonas bilix), specifically to search for genomes of cocultured bacteria. Because no external
bacteria were added to the protist stock cultures, it is probable that the cocultured bacteria
came from the original water samples from which the protists were isolated. At least some of
these bacteria are consumed as a food source by the protists, all of which are obligate
heterotrophs. From four separate metagenomic de novo assemblies for these mixed cultures, we
identified 28 scaffolds, which BUSCO analyses suggest represent complete or near-complete
bacterial genomes. These scaffolds range in length from 3,139,436 to 6,090,282 bp and encode
2873 to 5666 genes. Only eight of the 28 scaffolds corresponded to entries in the NCBI genome
database, meaning that 20 of these scaffolds represent genomes from putatively novel bacterial
species. Our findings highlight that data like these, which are often discarded or overlooked,
can be a source of novel genomes and/or species.
INTRODUCTION


As part of ongoing protist genome projects, we have generated large-scale genomic data
from several recently discovered free-living heterotrophic flagellates, including the cryptist
Palpitomonas bilix (Yabuki et al., 2010; 2014), the undescribed mantamonad strain SRT-306
(Glucksman et al., 2011; Brown et al., 2018), the phylogenetically deep-branching discobid
Ophirina amphinema (Yabuki et al., 2018), and the marine malawimonad Imasa heleensis
(Heiss et al., 2020). All of these protists have been recognized in the last decade as
representing deeply branching lineages amongst all eukaryotes (Yabuki et al., 2010; Glucksman et
al., 2011; Brown et al., 2018; Yabuki et al., 2018), and their genomic and cell structural data
potentially have significant implications for better understanding early eukaryotic evolution,
including the nature of the last eukaryotic common ancestor (LECA). Moreover, these microbes
are phagotrophic nanoflagellates, an ecological group that plays an important role as grazers
of bacteria (Collier and Rest, 2019). As each protist species investigated here is an obligate
heterotroph, bacteria, presumably coisolated together with the protist species, are consumed
by the protist as a necessary food source. However, the identity and community composition
of the cocultured bacteria have been unknown.

Despite efforts to reduce bacterial load in DNA extraction, such as by physical separation
using polycarbonate membrane filters before harvesting, the bacteria could not be completely
removed, and their genomic sequences became a major source of contamination in the
resulting data, often accounting for the majority of the generated reads. Therefore, the bacterial
sequences had to be identified and separated from the eukaryotic reads during the process of
protist genome assembly and annotation.

Studies in other systems have shown that there is value in nontarget sequencing data. For
example, genomic and transcriptomic datasets generated from animal tissues can contain
sequences originating from parasites and/or symbionts of those organisms. Classifying and
identifying these sequences can shed light on the distribution of particular pathogens in animal
populations (Lopes et al., 2017; Galen et al., 2020). In eukaryote genome projects like ours,
these bacterial data, which could be valuable to the microbial research community, are often
discarded without being assembled, annotated, and archived at a public data repository.
However, these data may contain new genomes and/or species, and depending on sampling,
culturing, and DNA extraction conditions, they could reveal population dynamics of the sampled
habitat or the culture system. Here, we analyzed such “contaminating” bacterial data: from
these four protist cultures, we assembled, annotated, and identified 28 scaffolds representing
complete or near-complete bacterial genomes. Our analyses show that the majority of these
bacterial scaffolds represent species that are not currently represented in GenBank’s bacterial
genome databases.


MATERIALS AND METHODS 


CULTURE CONDITIONS: The culture strain of Palpitomonas bilix investigated in this study
was established from a driftwood sample from Macharchar Island, Palau, collected in June
2006 (Yabuki et al., 2010). The malawimonad Imasa heleensis and the discobid Ophirina
amphinema were collected in September 2013 and March 2016, respectively, from a shallow
lagoon on Nusa Lavata in the Hele island chain, in the Western Province of the Solomon
Islands (Yabuki et al., 2018; Heiss et al., 2020). Mantamonad strain SRT-306 came from the
scraped surface sample of a barracuda that was caught in a lagoon in Iriomote-jima, Okinawa
Prefecture, Japan, in September 2013. All of the cultures grew well and were maintained in
Erd-Schreiber medium (ESM: UTEX) or ESM fortified with 2.5% Cerophyl medium (ATCC
802) at 23° C, with the exception of SRT-306, which was maintained at 16° C.

DNA EXTRACTION AND SEQUENCING: For I. heleensis and O. amphinema, DNA was
extracted on multiple occasions between January 2018 and May 2019. Cell cultures were
scraped using a sterile cell scraper to lift adherent cells, and protist cells were collected on 0.6
µm and 0.4 µm Millipore polycarbonate filters, respectively, under partial vacuum, to reduce
bacterial load. Cells were incubated on these filters for 3 hours at 56° C in lysis buffer from the
Qiagen MagAttract HMW DNA Kit (Qiagen, Hilden, Germany). DNA was extracted from the
lysates using this same kit according to the manufacturer's instructions. An aliquot of DNA (1
µg) from each culture was sent to the New York Genome Center for Illumina Paired End 2x150
bp sequencing on the Illumina HiSeqX platform (Illumina, San Diego, CA). Additional DNA
(~1-4 µg) was prepared for sequencing on the Oxford Nanopore platform using either the
SQK-LSK108 or SQK-LSK109 Genomic DNA by Ligation kit (Oxford Nanopore Technologies,
Oxford, UK) according to the manufacturer's instructions. Libraries were sequenced on the
MinION platform using FLO-MIN106 SpotON R9 Flow Cells.

For P. bilix and mantamonad strain SRT-306, DNA was extracted using the Qiagen DNeasy
Blood & Tissue Kit, following the manufacturer's instructions. Cells were collected onto 0.8 µm
filters, as specified above, and washed three times each with 5-10 ml artificial seawater to
reduce the bacterial load prior to DNA extraction. The plates on which SRT-306 was grown
were scraped to lift adherent cells; no such procedure was required for the free-swimming P.
bilix. DNA samples were sent to Cornell Sequencing Core and New York Genome Center for
Illumina Nextera library preparations and Paired End 2x150 bp sequencing on the
Illumina HiSeq2500 platform.

BASE CALLING AND ASSEMBLY: Raw MinION FastQ files were base-called using Guppy v1.1
(Oxford Nanopore Technologies). Hybrid assemblies combining the MinION and Illumina data
from I. heleensis and O. amphinema were generated using MaSuRCA v3.2.6 (Zimin et al.,
2013). Illumina reads generated from cultures of P. bilix and SRT-306 were assembled using
ALLPATHS-LG (Gnerre et al., 2011; Ribeiro et al., 2012).

TABLE 1. Features of the four metagenomic assemblies.

Name of Protist Culture           Sampling Location                                Collection Year  DNA Extraction Year  Reference
SRT-306 (undescribed mantamonad)  Iriomote-jima, Japan                             2013             2015                 This study
Palpitomonas bilix                Macharchar Island, Republic of Palau             2006             2013-2014            Yabuki et al., 2010
Imasa heleensis                   Nusa Lavata, Hele island chain, Solomon Islands  2013             2019                 Heiss et al., 2020
Ophirina amphinema                Nusa Lavata, Hele island chain, Solomon Islands  2016             2019                 Yabuki et al., 2018

IDENTIFICATION OF BACTERIAL GENOMES FROM METAGENOMIC ASSEMBLIES: The resulting
assembled scaffolds and contigs were individually analyzed for completeness using
Benchmarking Universal Single-Copy Orthologs (BUSCO) v3 (Simao et al., 2015) with the “Bacteria odb9”
dataset. We searched for 5S, 16S, and 23S rRNA genes using rnammer v1.2
(Lagesen et al., 2007). Scaffolds that both (a) had a BUSCO completeness ≥80% and (b)
contained at least one complete rRNA operon were retained as “complete” bacterial scaffolds.
The 80% completeness threshold was chosen because there was a sharp cutoff in completeness
for scaffolds below 80%, most of which were less than 60% complete and/or did not contain
complete rRNA operons. In an attempt to circularize the complete scaffolds, two independent
approaches were taken: MUMmer v3.0 (Kurtz et al., 2004) was used to look for overlaps at the
ends of each scaffold, and Circlator (Hunt et al., 2015) was used to attempt circularization of
each scaffold assembly.
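
The two retention criteria described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: the record layout (ScaffoldQC), the function names, and the example values are hypothetical, and treating "all three rRNA gene types annotated on the scaffold" as a complete operon is a simplifying assumption, since a real operon check would also confirm that the genes are adjacent.

```python
from dataclasses import dataclass

# rRNA gene types that together count as a complete operon in this sketch.
REQUIRED_RRNA = frozenset({"5S", "16S", "23S"})


@dataclass(frozen=True)
class ScaffoldQC:
    """Per-scaffold quality summary (hypothetical record layout)."""
    scaffold_id: str
    busco_completeness: float  # percent of complete BUSCOs, 0-100
    rrna_genes: frozenset      # rRNA gene types annotated on the scaffold


def is_complete(scaffold, min_busco=80.0):
    """Return True if the scaffold passes both retention criteria."""
    has_operon = REQUIRED_RRNA <= scaffold.rrna_genes
    return scaffold.busco_completeness >= min_busco and has_operon


def retain_complete(scaffolds, min_busco=80.0):
    """Return the IDs of scaffolds passing both criteria, in input order."""
    return [s.scaffold_id for s in scaffolds if is_complete(s, min_busco)]


# Hypothetical example records (not data from this study).
example = [
    ScaffoldQC("scaffold_a", 96.6, frozenset({"5S", "16S", "23S"})),
    ScaffoldQC("scaffold_b", 58.1, frozenset({"5S", "16S", "23S"})),
    ScaffoldQC("scaffold_c", 91.9, frozenset({"16S"})),
]
print(retain_complete(example))
```

In this toy input only scaffold_a is retained: scaffold_b falls below the completeness threshold and scaffold_c lacks two of the three rRNA genes.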

ANNOTATION AND CLASSIFICATION OF BACTERIAL GENOMES: Bacterial scaffolds were
annotated for gene content using the rapid prokaryotic genome annotation tool, Prokka (Seemann,
2014). The scaffolds were classified into taxonomic groups using the Genome Taxonomy
Database Tool Kit (GTDB-TK) version 0.3.1 (Chaumeil et al., 2019). Scaffolds were compared to
each other using the online ANI calculator tool from the EZ BioCloud database (Yoon et al.,
2017). The scaffolds are available on NCBI under the BioProject accession code PRJNA619388.

ABUNDANCE ESTIMATES: Relative abundance estimates were calculated for each complete
genome by aligning the Illumina Paired End reads to the pooled set of complete bacterial
genomes isolated from the same culture using the Burrows-Wheeler Aligner (BWA) (Li and
Durbin, 2009; Durbin, 2010), retaining only the uniquely mapping read pairs. Read pairs
mapping to each scaffold were counted using SAMtools version 1.9 (Li et al., 2009; Li, 2011). Final
calculations and plots were made in R version 3.3.3 and ggplot2 version 3.3.2 (Wickham, 2016).
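
As an illustration of the final calculation, the sketch below converts per-scaffold read-pair counts into proportions of length-normalized read pairs. The exact normalization is not spelled out in the text, so this assumes the simplest reading: read pairs divided by scaffold length, then rescaled to sum to 1 within a culture. The function name and the example counts are hypothetical.

```python
def length_normalized_proportions(read_pairs, lengths):
    """Return each scaffold's share of length-normalized read pairs.

    read_pairs: dict of scaffold ID -> uniquely mapped read pairs
    lengths:    dict of scaffold ID -> scaffold length in bp
    """
    if not read_pairs:
        raise ValueError("no scaffolds supplied")
    # Read pairs per base pair, so long scaffolds are not favored.
    density = {sid: n / lengths[sid] for sid, n in read_pairs.items()}
    total = sum(density.values())
    if total == 0:
        raise ValueError("no read pairs mapped to any scaffold")
    return {sid: d / total for sid, d in density.items()}


# Hypothetical counts for two scaffolds from one culture.
read_pairs = {"scaffold_a": 300_000, "scaffold_b": 50_000}
lengths = {"scaffold_a": 4_000_000, "scaffold_b": 2_000_000}

for sid, share in length_normalized_proportions(read_pairs, lengths).items():
    print(f"{sid}: {share:.2%}")
```

Without the length normalization, scaffold_a would account for about 86% of the mapped read pairs; after normalization its share is 75%, because it is twice as long as scaffold_b.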


RESULTS


IDENTIFICATION OF BACTERIAL GENOMES FROM METAGENOMIC ASSEMBLIES: We screened
mixed assemblies from four mono-isolate cultures of marine protists that were sequenced for
the purpose of assembling high-quality reference genomes for the protist species (table 1). We
identified 28 scaffolds that represent complete or near-complete bacterial genomes: four from
SRT-306, three from P. bilix, six from O. amphinema, and 15 from I. heleensis (fig. 2, table 2).
We attempted to circularize all genomes by looking for overlaps at the ends of the scaffolds by
aligning each scaffold to itself and plotting the overlaps using MUMmer (Kurtz et al., 2004),
and by running each scaffold through the Circlator assembly circularization pipeline (Hunt et
al., 2015). MUMmer found no overlapping regions in any scaffold, and Circlator did not
circularize any of the 28 scaffolds (data not shown). All but one of the 28 scaffolds contained at
least one complete prokaryotic rRNA operon (i.e., with full-length
5S, 16S, and 23S rRNA; table 2). For scaffold CP051235, only one
incomplete rRNA operon was annotated, which contained full-length
16S and 23S rRNA genes separated by two tRNA loci, but
no downstream 5S rRNA. The 16S and 23S rRNA genes are located
midscaffold on CP051235, and genes are annotated both upstream
of the 16S and downstream of the 23S loci; thus, the lack of an
annotated 5S rRNA is not due to an assembly truncation.

FEATURES AND GENE CONTENT OF EACH BACTERIAL SCAFFOLD:
We used a rapid annotation tool, Prokka (Seemann, 2014),
to annotate each bacterial scaffold for gene content and other basic
genomic features (table 2). The 28 bacterial scaffolds ranged in
length from 3.1 to 6.1 Mb, contained 2873 to 5666 annotated
genes, and had an average of 44.5 tRNAs, congruent with what is
typical of bacterial genomes (Land et al., 2015). GC content of
these genomes ranged from 37.4 to 65.8%, within the range noted
for bacterial genomes (Almpanis et al., 2018).
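
For readers unfamiliar with the GC content statistic, the sketch below shows one common way to compute it from a nucleotide sequence. It is an illustration only: the text does not state which tool produced the GC values reported here, and the decision to exclude ambiguous bases such as N from the denominator is an assumption of this sketch.

```python
def gc_content(sequence):
    """Return the GC content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted, so runs of N do
    not dilute the estimate.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


# Hypothetical toy sequence: four of eight bases are G or C.
print(gc_content("GGCCAATT"))
```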

FIG. 1. Light microscopy images showing the four protist cells mentioned in this study. Labels
mark the protist cell and bacteria (bac). Scale bar = 5 µm. A. Palpitomonas bilix; B. mantamonad
strain SRT-306; C. Ophirina amphinema; D. Imasa heleensis.

TAXONOMIC CLASSIFICATION OF THE 28 SCAFFOLDS: We used
the GTDB-TK tool to classify each genome taxonomically (Chaumeil
et al., 2019). From the 28 scaffolds we found three phyla, five
classes, eight orders, 10 families, and 21 genera represented (table
3). Eight of the scaffolds were classified to the species level. Two of
these genomes represent the same taxon, Alcanivorax sp. DSM
26293 (GenBank Assembly Accession GCF_900107995.1), which is
present in data from the cultures of O. amphinema and I. heleensis.
These two scaffolds are close to identical at the sequence level, with
an average nucleotide identity (ANI) of 99.99%. Of the remaining
20 scaffolds, 19 were classified to the genus level and one (CP051235)
was classified only to the family level. As such, these 20 scaffolds
could represent either novel or known but unsequenced bacterial
species. GTDB-TK establishes taxonomy using a combination of the placement of the query into
a reference tree, the query's Relative Evolutionary Distance (RED), and its ANI percent values
compared to reference genomes (Matsen et al., 2010; Parks et al., 2018; Chaumeil et al., 2019).
Queries placed on terminal branches of the reference tree (i.e., those that are sister to individual
known species or strains) are placed in the genus of their sister taxon. A query is classified to species
level if it is placed in an existing genus and its ANI value is within the circumscription radius
(typically, 95%) of the closest species; otherwise the query is classified as a novel species in that
genus (Chaumeil et al., 2019). For example, scaffold CP051248 had an ANI value of 99.94% and
was classified as Pseudooceanicola atlanticus. In contrast, scaffold CP051241 was placed under
the genus Marinobacter. Its closest ANI value, to the species Marinobacter similis, was 89.97%,
suggesting that it is a novel species of Marinobacter. For queries that are placed on internal

TABLE 2. Features of each bacterial genome. Genomes are organized by protist culture of origin. Asterisks
indicate different assemblies of the same bacterial strain.

Culture             Assembly ID  Length (bp)  GC content (%)  Genes (no.)  tRNA (no.)    rRNA (no.)    BUSCO completeness (%)  Coverage
Ophirina amphinema  CP051240     3961866      58.3            3465         44            9             98                      64.19
                    CP051241     4272096      [illegible]     3872         52            9             98.6                    6995.73
                    CP051246     4734807      37.4            4051         45            9             95.2                    406.72
                    CP051249     3255866      59.1            3101         42            3             87.2                    36.68
                    CP051253*    3835116      58.5            3501         42            6             [illegible]             138.91
                    CP051254     3573046      58.1            3409         42            3             96.6                    40.01
Imasa heleensis     CP051228     4907657      62.93           4474         51            6             96.6                    1042.67
                    CP051229     3398407      64.06           3469         44            4             82.4                    12.14
                    CP051230     3829003      65.3            3650         42            [illegible]   [illegible]             169.94
                    CP051232     4675334      65.78           4510         50            6             95.9                    56.15
                    CP051234     4470904      58.52           3037         50            [illegible]   95.3                    1216.04
                    CP051235     3307379      60.77           3227         47            2             96.6                    [illegible]
                    CP051237     6060643      59.7            5666         45            6             98                      290.56
                    CP051239     3139436      64.53           2873         36            6             88.5                    120.15
                    CP051242     3747071      38.28           3174         39            6             95.3                    18.43
                    CP051243     3439134      44.43           3006         36            6             94.6                    1371.79
                    CP051245     4052158      41.12           3702         39            6             [illegible]             108.24
                    CP051247     6090282      57.87           4652         53            3             85.8                    175.53
                    CP051248     4533893      64.23           4245         47            6             95.2                    151.18
                    CP051250     4421565      57.26           4018         [illegible]   9             99.3                    953.62
                    CP051252*    3837393      58.52           3500         43            6             97.3                    62.26
SRT-306             CP051231     4309625      65.55           4215         43            5             95.9                    95.3
                    CP051233     3386480      62.39           3248         46            3             [illegible]             60.62
                    CP051238     4744784      47.64           4176         46            3             85.1                    150.07
                    CP051251     3496013      64.62           3297         40            [illegible]   89.2                    77.19
Palpitomonas bilix  CP051236     3707127      60.13           3496         49            7             97.9                    351.93
                    CP051244     3755467      43.86           3308         40            6             90.5                    18.82
                    CP051255     3958902      38.18           3177         37            6             91.9                    20.98

FIG. 2. Count of scaffolds in each metagenome that are ≥80% complete, as determined by BUSCO.
[Bar chart; one bar per culture (Palpitomonas bilix, mantamonad str. SRT-306, Ophirina amphinema,
Imasa heleensis); x-axis: count of bacterial scaffolds, 0 to 15.]


branches, taxonomy is instead determined by placement and/or RED (Parks et al., 2018;
Chaumeil et al., 2019). Only one of our queries, scaffold CP051235 (which was the scaffold lacking the
5S rRNA gene), was not placed in an existing genus. This scaffold was placed in the family
Hyphomicrobiaceae, and potentially represents a new genus therein.
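
The species-level rule described in this section can be written as a short decision function. This is a simplified sketch of the rule as stated in the text, not GTDB-TK's implementation: the function name and the output label for novel species are hypothetical, and the tree placement, RED, and any other checks the tool applies are deliberately left out. The two example calls use the ANI values reported above for CP051248 and CP051241.

```python
def classify_within_genus(genus, closest_species, ani, radius=95.0):
    """Assign a query already placed in a genus to a species, if possible.

    genus:           genus the query was placed in
    closest_species: name of the closest reference species, or None
    ani:             percent ANI to that species, or None
    radius:          species circumscription radius in percent ANI
    """
    if closest_species is not None and ani is not None and ani >= radius:
        return closest_species
    # Outside the radius of every reference species: putatively novel.
    return f"{genus} sp. (putatively novel)"


# ANI values reported in the text for two scaffolds.
print(classify_within_genus(
    "Pseudooceanicola", "Pseudooceanicola atlanticus", 99.94))
print(classify_within_genus(
    "Marinobacter", "Marinobacter similis", 89.97))
```

The first call returns the reference species name because 99.94% lies within the 95% radius; the second falls outside it and is reported as a putatively novel Marinobacter species.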

RELATIVE ABUNDANCE OF BACTERIAL SPECIES IN EACH CULTURE: We next evaluated the
relative abundance of the 28 bacterial scaffolds by calculating the number of Illumina read pairs
that aligned uniquely to each genome from the four sequencing libraries (fig. 3). Two of the
metagenomes, from O. amphinema and P. bilix cultures, were each dominated by a single
bacterial taxon, Marinobacter and Nitratireductor respectively, that accounted for ~90% of the
bacterial data from each library. The metagenomic data from the I. heleensis and SRT-306
cultures were not dominated by any one species (fig. 3).


DISCUSSION 


In this study, we investigated bacterial data that are often discarded in the course of
sequencing genomes from xenic protist cultures. From four metagenomic protist genome
assemblies, we identified 28 assembly scaffolds that represent complete or near-complete
bacterial chromosomes.

We were able to classify eight of the bacterial scaffolds to species level, 19 to genus level,
and one to family level. Only two of these 28 bacterial scaffolds represented the same species,
Alcanivorax sp. DSM 26293, which was present in both the O. amphinema and I. heleensis cell
culture metagenomes, making a total of 27 distinct bacterial taxa in our dataset. Ophirina
amphinema and I. heleensis were originally isolated from the same lagoon, so this
Alcanivorax bacterium possibly derives from that original source.

TABLE 3. Taxonomic classification of bacterial genomes. Results given by GTDB-TK database. “NA”
indicates scaffolds for which no closest hit was given. Asterisks (*) indicate placeholder family and genus names
provided by GTDB-TK when no family and/or genus name is present in NCBI database.

Assembly  Family              Genus             Species                           Closest hit: species name / ID (% ANI)
CP051228  GCA-2696645*        GCA-2696645*      -                                 NA
CP051229  Hyphomonadaceae     UBA7672*          -                                 NA
CP051230  Hyphomonadaceae     Hyphomonas        -                                 Hyphomonas atlantica / GCA_000682715.1 (83.4)
CP051231  Rhodobacteraceae    Oceanicola        -                                 Oceanicola litoreus / GCA_900142295.1 (85.71)
CP051232  Rhodobacteraceae    Maritimibacter    -                                 Maritimibacter sp. / GCA_002701395.1 (87.84)
CP051233  Rhodobacteraceae    Thalassobius      -                                 Rhodobacteraceae bacterium / GCA_002708925.1 (80.76)
CP051234  Hyphomicrobiaceae   Filomicrobium     -                                 Alphaproteobacteria bacterium BRH_c36 / GCA_001516065.1 (78.08)
CP051235  Hyphomicrobiaceae   -                 -                                 NA
CP051236  Phyllobacteriaceae  Nitratireductor   -                                 Nitratireductor sp. / GCA_002697745.1 (89.84)
CP051237  Rhizobiaceae        Pararhizobium     -                                 Pararhizobium haloflavum / GCA_002750855.1 (77.4)
CP051238  Alteromonadaceae    Aestuariibacter   -                                 Aestuariibacter aggregatus / GCA_900129565.1 (87.13)
CP051239  Alcanivoracaceae    Alcanivorax       -                                 Alcanivorax sp. UBA2685 / GCA_002354605.1 (94.97)
CP051240  Alcanivoracaceae    Alcanivorax       -                                 Alcanivorax nanhaiticus / GCA_000756665.1 (84.13)
CP051241  Alteromonadaceae    Marinobacter      -                                 Marinobacter similis / GCA_000830985.1 (89.97)
CP051242  Balneolaceae        UBA7797*          -                                 Balneolaceae bacterium UBA7797 / GCA_002480645.1 (76.65)
CP051243  Cryomorphaceae      UBA7878*          -                                 Flavobacteriales bacterium UBA7878 / GCA_002501205.1 (78.02)
CP051244  Flavobacteriaceae   Muricauda         -                                 NA
CP051245  Flavobacteriaceae   Muricauda         -                                 NA
CP051246  Flavobacteriaceae   Salegentibacter   -                                 NA
CP051247  Planctomycetaceae   UBA9033*          -                                 Planctomycetaceae bacterium UBA2671 / GCA_002359185.1 (76.63)
CP051248  Rhodobacteraceae    Pseudooceanicola  Pseudooceanicola atlanticus       Pseudooceanicola atlanticus / GCA_000768315.1 (99.94)
CP051249  Rhodobacteraceae    Epibacterium      Epibacterium mobile               Epibacterium mobile / GCA_001681715.1 (96.59)
CP051250  Alteromonadaceae    Marinobacter      Marinobacter adhaerens            [illegible]
CP051251  Rhodobacteraceae    Pseudooceanicola  Pseudooceanicola nitratireducens  [illegible]
CP051252  Alcanivoracaceae    Alcanivorax       Alcanivorax sp. DSM 26293         [illegible]
CP051253  Alcanivoracaceae    Alcanivorax       Alcanivorax sp. DSM 26293         Alcanivorax sp. DSM 26293 / GCA_900107995.1 (98.96)
CP051254  Hyphomonadaceae     Hyphomonas        Hyphomonas atlantica              [illegible]
CP051255  Balneolaceae        Balneola          Balneola sp. EnC07                [illegible]


We searched the literature for the top taxonomic hits of each genome to ascertain the
typical environmental ranges of the source organisms. Salegentibacter is a genus found in
hypersaline lakes, on the surfaces of marine fauna, and in marine sediments (McCammon
and Bowman, 2000; Bowman, 2016). Hyphomonas (Abraham, 2020), Epibacterium (Wirth
and Whitman, 2020), Maritimibacter (Lee et al., 2007), Aestuariibacter (Yi et al., 2004),
Thalassobius (Arahal et al., 2005), Muricauda (Bruns et al., 2001; Bruns and Berthe-Corti,
2015), Nitratireductor (Singh et al., 2012), Balneola (Urios et al., 2006), Oceanicola (Cho
and Giovannoni, 2004), Pseudooceanicola (Lai et al., 2015), Planctomycetaceae (Ward, 2015),
and Balneolaceae (Xia et al., 2016) represent groups of widely dispersed marine bacteria
isolated from seawater and/or marine sediments or surfaces. Alcanivorax (Yakimov et al.,
1998; Golyshin et al., 2015), Marinobacter (Gauthier et al., 1992; Bowman and McMeekin,
2015), Hyphomicrobiaceae (Kesy et al., 2019), and Filomicrobium (Schlesner, 1987; 2015) are
marine groups previously isolated from seawater and seawater sediments enriched with
crude oil and/or hydrocarbons. Nitratireductor is a nitrate-reducing genus found in various
marine habitats (Labbe et al., 2004), members of which have previously been isolated from
diatom cultures (Jang et al., 2011). The family Cryomorphaceae is present within a wide
range of nonextreme ecosystems, both marine and terrestrial (Bowman, 2015). Lastly,
Pararhizobium is a nitrogen-fixing genus that associates with plant roots and has a worldwide
distribution (Mousavi et al., 2015).

In summary, all but one of the described groups (Pararhizobium) are characteristically
found in marine habitats, consistent with the fact that the cultures are from marine
environments. These cocultured bacteria therefore are likely from the source samples from which each
protist was isolated. New microbial genomic data generated from little-sampled habitats have
the potential to be highly valuable to the research community, as studies have shown that
marine and other infrequently sampled niches may contain more novel microbial diversity than
habitats that are much closer at hand, like the human gut or local soils (Bech et al., 2020;
Tessler et al., 2017). However, we cannot rule out that these bacteria are environmental contaminants,
considering that most of the taxa have global marine distributions, the seawater culture medium
would select for marine species, our lab in Manhattan is surrounded by seawater, and
bacteria are commonly dispersed through the air and on surfaces (Mayol, 2017).

FIG. 3. Relative abundance of each bacterial scaffold in Illumina sequencing libraries, represented as
the proportion of length-normalized read pairs for the portion of each sequencing library mapping to the
bacterial scaffolds. The two Alcanivorax sp. DSM 26293 scaffolds are marked with asterisks (*).
[Stacked bar chart; x-axis: protist culture (P. bilix, mantamonad SRT-306, O. amphinema, I. heleensis);
y-axis: proportion of length-normalized read pairs, 0 to 1.]

Legend: bacterial assembly ID (taxonomy), % length-normalized read pairs

P. bilix:
CP051244 (Muricauda) 4.80%
CP051255 (Balneola sp. EnC07) 5.36%
CP051236 (Nitratireductor) 89.84%

Mantamonad SRT-306:
CP051233 (Thalassobius) 15.82%
CP051251 (Pseudooceanicola nitratireducens) 20.15%
CP051231 (Oceanicola) 24.87%
CP051238 (Aestuariibacter) 39.16%

O. amphinema:
CP051249 (Epibacterium mobile) 0.48%
CP051254 (Hyphomonas atlantica) 0.52%
CP051240 (Alcanivorax) 0.84%
CP051253 (Alcanivorax sp. DSM 26293) 1.81% *
CP051246 (Salegentibacter) 5.29%
CP051241 (Marinobacter) 91.06%

I. heleensis:
CP051229 (UBA7672) 0.21%
CP051242 (UBA7797) 0.32%
CP051235 (Hyphomicrobiaceae) 0.56%
CP051232 (Maritimibacter) 0.97%
CP051252 (Alcanivorax sp. DSM 26293) 1.08% *
CP051245 (Muricauda) 1.87%
CP051239 (Alcanivorax) 2.08%
CP051248 (Pseudooceanicola atlanticus) 2.62%
CP051230 (Hyphomonas) 2.94%
CP051247 (UBA9033) 3.04%
CP051237 (Pararhizobium) 5.03%
CP051250 (Marinobacter adhaerens) 16.50%
CP051228 (GCA-2696645) 18.04%
CP051234 (Filomicrobium) 21.04%
CP051243 (UBA7878) 23.73%


When we calculated the relative abundance of each bacterial scaffold across the four
cultures, we found that the species are unevenly represented in the sequencing libraries: the
Ophirina amphinema and Palpitomonas bilix cultures each had one dominant bacterial taxon,
Marinobacter and Nitratireductor respectively, while the malawimonad and mantamonad cultures
each had a more even distribution of bacterial taxa (fig. 3). This distribution may represent
population patterns in the source cultures, possibly caused by varying bacterial diversity in the
original samples or by differing grazing preferences of each isolated protist in culture. However,
each culture was preprocessed using a polycarbonate membrane filter prior to DNA extraction,
which may have skewed the observed distribution of each species by preferentially retaining larger
bacterial cells and bacterial flocs. Therefore, the bacterial abundances presented in figure 3 are not
necessarily representative of the bacterial community of each culture.

Importantly, 20 of the 28 scaffolds we identified represent taxa that are not yet present in
bacterial genome databases. As mentioned above, these are data that we usually discard from
our protist genomics projects, but we find that they can also be a source of novel bacterial
genomes and species. We note as well that we obtained these unidentified species from
long-term cultures grown under constant environmental conditions and containing relatively few
different organisms. This suggests that establishing axenic cultures of at least some of these
novel bacteria might be practicable, thereby allowing for their formal description. However,
an axenic culture may not remain a prerequisite for formal description.
A recent consensus statement (Murray, 2020) suggests mechanisms for the establishment of
new prokaryote taxa based on genomic data. Should this consensus be formally accepted, all of
our new taxa would be candidates for formal description. In either case, we find that the
prokaryotic genomes generated during eukaryote sequencing projects need not be disposable
by-products. Instead, we suggest they are a key source for potentially important new discoveries.